Neural volumetric representations have become a widely adopted model for radiance fields in 3D scenes. These representations are fully implicit or hybrid function approximators of the instantaneous volumetric radiance in a scene, which are typically learned from multi-view captures of the scene. We investigate the new task of neural volume super-resolution - rendering high-resolution views corresponding to a scene captured at low resolution. To this end, we propose a neural super-resolution network that operates directly on the volumetric representation of the scene. This approach allows us to exploit an advantage of operating in the volumetric domain, namely the ability to guarantee consistent super-resolution across different viewing directions. To realize our method, we devise a novel 3D representation that hinges on multiple 2D feature planes. This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks on the 2D feature planes. We validate the proposed method's capability of super-resolving multi-view consistent views both quantitatively and qualitatively on a diverse set of unseen 3D scenes, demonstrating a significant advantage over existing approaches.
translated by 谷歌翻译
灵感来自最近隐含地代表具有训练有素的神经网络的信号的进步,我们旨在学习窄基线4D光场的连续表示。我们提出了一个用于4D光字段的隐式表示模型,其在稀疏的输入视图上被调节。我们的模型受过培训,以输出连续范围的查询时空坐标的光场值。鉴于稀疏的输入视图集,我们的方案可以通过灵活的因素超级解决空间和角域中的输入。由一个特征提取器和解码器组成,该解码器在光场补丁的数据集上培训。特征提取器从输入视图捕获每个像素特征。这些特征可以调整为所需的空间分辨率,并与查询坐标一起馈送到解码器。该配方使我们能够以任何期望的空间和角度分辨率重建光场视图。此外,我们的网络可以处理输入视图的场景,其中输入视图是低分辨率或缺失像素。实验表明,我们的方法在计算快速的同时实现了视图综合任务的最先进的性能。
translated by 谷歌翻译
In the past years, deep learning has seen an increase of usage in the domain of histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and be able to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H\&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution as well as out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
translated by 谷歌翻译
In recent years, reinforcement learning (RL) has become increasingly successful in its application to science and the process of scientific discovery in general. However, while RL algorithms learn to solve increasingly complex problems, interpreting the solutions they provide becomes ever more challenging. In this work, we gain insights into an RL agent's learned behavior through a post-hoc analysis based on sequence mining and clustering. Specifically, frequent and compact subroutines, used by the agent to solve a given task, are distilled as gadgets and then grouped by various metrics. This process of gadget discovery develops in three stages: First, we use an RL agent to generate data, then, we employ a mining algorithm to extract gadgets and finally, the obtained gadgets are grouped by a density-based clustering algorithm. We demonstrate our method by applying it to two quantum-inspired RL environments. First, we consider simulated quantum optics experiments for the design of high-dimensional multipartite entangled states where the algorithm finds gadgets that correspond to modern interferometer setups. Second, we consider a circuit-based quantum computing environment where the algorithm discovers various gadgets for quantum information processing, such as quantum teleportation. This approach for analyzing the policy of a learned agent is agent and environment agnostic and can yield interesting insights into any agent's policy.
translated by 谷歌翻译
Machine learning methods like neural networks are extremely successful and popular in a variety of applications, however, they come at substantial computational costs, accompanied by high energy demands. In contrast, hardware capabilities are limited and there is evidence that technology scaling is stuttering, therefore, new approaches to meet the performance demands of increasingly complex model architectures are required. As an unsafe optimization, noisy computations are more energy efficient, and given a fixed power budget also more time efficient. However, any kind of unsafe optimization requires counter measures to ensure functionally correct results. This work considers noisy computations in an abstract form, and gears to understand the implications of such noise on the accuracy of neural-network-based classifiers as an exemplary workload. We propose a methodology called "Walking Noise" that allows to assess the robustness of different layers of deep architectures by means of a so-called "midpoint noise level" metric. We then investigate the implications of additive and multiplicative noise for different classification tasks and model architectures, with and without batch normalization. While noisy training significantly increases robustness for both noise types, we observe a clear trend to increase weights and thus increase the signal-to-noise ratio for additive noise injection. For the multiplicative case, we find that some networks, with suitably simple tasks, automatically learn an internal binary representation, hence becoming extremely robust. Overall this work proposes a method to measure the layer-specific robustness and shares first insights on how networks learn to compensate injected noise, and thus, contributes to understand robustness against noisy computations.
translated by 谷歌翻译
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
深度强化学习(DRL)是一种仅从演示和经验中学习机器人控制政策的有前途的方法。为了涵盖机器人的整个动态行为,DRL训练是通常在仿真环境中得出的主动探索过程。尽管这种模拟培训廉价且快速,但将DRL算法应用于现实世界的设置很困难。如果对代理进行训练直到它们在模拟中安全执行,则由于模拟动力学和物理机器人之间的差异引起的SIM到真实差距,将其传输到物理系统很困难。在本文中,我们提出了一种在线培训DRL代理的方法,可以使用基于模型的安全主管在实体车辆上自动驾驶。我们的解决方案使用监督系统检查代理选择的操作是安全还是不安全,并确保在车辆上始终采取安全措施。这样,我们可以在安全,快速,有效地训练DRL算法的同时绕过SIM到现实的问题。我们提供各种现实世界实验,在线培训一辆小型实体车辆,可以自动驾驶,没有事先模拟培训。评估结果表明,我们的方法在未崩溃的同时提高了样品效率的训练代理,并且受过训练的代理比在模拟中训练的代理表现出更好的驾驶性能。
translated by 谷歌翻译
对抗性补丁攻击是现实世界深度学习应用程序的新兴安全威胁。我们提出了戴定的平滑,这是第一种(符合我们的知识),以证明语义分割模型与此威胁模型的鲁棒性。以前关于防御补丁攻击的辩护的工作主要集中在图像分类任务上,并且经常需要更改模型体系结构和其他培训,而这些培训是不受欢迎且计算上昂贵的。在被删除的平滑度中,可以在没有特定培训,微调或限制体系结构的情况下应用任何分割模型。使用不同的掩盖策略,可以将拔掉的平滑措施应用于认证检测和认证恢复。在广泛的实验中,我们表明,在检测任务中,平均可以证明1%补丁的像素预测的64%,而在ADE20K数据集中恢复任务的0.5%贴片为48%。
translated by 谷歌翻译
语言的视觉基础旨在用多种视觉知识来源(例如图像和视频)丰富语言表示。尽管视觉接地是一个深入研究的领域,但视觉接地的语言方面并没有得到太多关注。本研究调查了单词嵌入的语法视觉基础。我们在两个视觉和语言空间之间提出了一种隐式对齐技术,其中语言之间的文本信息相互作用以丰富预训练的文本单词嵌入。我们专注于实验中的三种语言,即英语,阿拉伯语和德语。我们获得了这些语言的视觉接地矢量表示形式,并研究了一种或多种语言的视觉接地是否改善了嵌入在单词相似性和分类基准上的嵌入性能。我们的实验表明,语法知识可以改善类似语言(例如德语和英语)的扎根嵌入性能。但是,德语或英语用阿拉伯语的语言基础导致单词相似性基准的性能略有降解。另一方面,我们观察到了分类基准的相反趋势,而阿拉伯语对英语的进步最大。在讨论部分中,提出了这些发现的几个原因。我们希望我们的实验为进一步研究的基线提供了有关语法间视觉接地的基准。
translated by 谷歌翻译
为了为视频产生适当的标题,推理需要确定相关的概念并注意它们之间的空间关系以及剪辑中的时间发展。我们的端到端编码器视频字幕框架结合了两个基于变压器的体系结构,这是一种用于单个关节时空视频分析的改编变压器,以及用于高级文本生成的基于自我注意力的解码器。此外,我们引入了一种自适应框架选择方案,以减少所需的传入帧数,同时在训练两个变压器时保持相关内容。此外,我们通过汇总每个样本的所有基础真理标题来估计与视频字幕相关的语义概念。我们的方法在MSVD以及大规模的MSR-VTT和VATEX基准数据集上实现了最新的结果,并考虑了多个自然语言产生(NLG)指标。对多样性得分的其他评估突出了我们生成的标题结构的表现力和多样性。
translated by 谷歌翻译